In this tutorial we will introduce the packages required for geospatial work in R, show how to read in a few spatial data types, demonstrate several ways to calculate landscape metrics, and briefly touch on spatial statistics. It is assumed that you have R and RStudio installed and that you, at a minimum, understand the basic concepts of the R language (e.g. you can work through R For Cats).

Core Spatial Packages

Required packages (the good old days)

Within the last several years a lot of effort has been spent on adding spatial data handling and analysis capabilities to R. Thanks to the very significant effort of the package authors, we now have the foundation for doing GIS entirely within R.

The four packages that provide this foundation are:

  • sp
  • rgdal
  • raster
  • rgeos

Let’s dig in a bit deeper on each of these.

sp

The sp package provides the primary spatial data structures for use in R. Many other packages assume your data is stored as one of the sp data structures. Getting into the details of these is beyond the scope of this workshop, but look at the introduction to sp vignette for more details. That being said, we will be working mostly with SpatialPointsDataFrame and SpatialPolygonsDataFrame. More on that later.

Getting sp added is no different than adding any other package that is on CRAN.

install.packages("sp")
library("sp")
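As a quick, made-up example of the sp data structures, a plain data.frame with coordinate columns can be promoted to a SpatialPointsDataFrame:

```r
library(sp)

# Made-up x/y data with one attribute column
df <- data.frame(x = runif(5), y = runif(5), value = 1:5)

# Promote the data.frame to a SpatialPointsDataFrame in place
coordinates(df) <- ~x + y
class(df)
```

The `coordinates<-` assignment is what does the work here: it pulls the named columns out as coordinates and leaves the rest as the attribute table.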

rgdal

The rgdal package provides tools for reading and writing multiple spatial data formats. It is based on the Geospatial Data Abstraction Library (GDAL), which is a project of the Open Source Geospatial Foundation (OSGeo). The primary use of rgdal is to read various spatial data formats into R and store them as sp objects. In this workshop we will only be using rgdal to read in shapefiles, but it has utility far beyond that.

As before, nothing special is needed to get set up with rgdal on Windows. Simply:

install.packages("rgdal")
library("rgdal")

Getting set up on Linux or Mac requires a bit more effort (i.e. GDAL needs to be installed first).
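Since rgdal also writes data, here is a minimal sketch of the write side; the layer name "pts_demo" and the data are made up for this example:

```r
library(sp)
library(rgdal)

# Made-up points to write out
pts <- SpatialPointsDataFrame(cbind(runif(5), runif(5)),
                              data = data.frame(id = 1:5),
                              proj4string = CRS("+proj=longlat +datum=WGS84"))

# Write the points to a shapefile in a temporary directory
writeOGR(pts, dsn = tempdir(), layer = "pts_demo", driver = "ESRI Shapefile")
```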

raster

Although sp and rgdal provide raster data capabilities, they require that the full raster dataset be read into memory. This has performance implications and limits the size of the datasets you can readily work with. The raster package works around this by keeping raster data on disk and reading it in only as needed. That too has some performance implications, but for the most part, in my opinion anyway, raster makes it easier to work with raster data. It also provides several additional functions for analyzing raster data.

To install, just do:

install.packages("raster")
library("raster")
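As a quick sketch of how this works (with made-up values), we can build a small raster from scratch; a raster this small lives in memory, while large files opened with raster() are left on disk:

```r
library(raster)

# Build a small 10 x 10 raster and fill it with made-up values
r <- raster(nrow = 10, ncol = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- runif(ncell(r))

# Check whether the raster's values are held in memory
inMemory(r)
```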

rgeos

The last of the core packages for doing GIS in R is rgeos. Like rgdal, rgeos is a project of OSGeo. It is a wrapper around the Geometry Engine - Open Source (GEOS) C++ library and provides a suite of tools for conducting vector GIS analyses.

To install:

install.packages("rgeos")
library("rgeos")

For Linux and Mac the GEOS library will also need to be available. As with rgdal, this is a bit beyond the scope of this tutorial.
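As a small, made-up example of the kind of vector operation rgeos provides, we can buffer a point and measure the resulting polygon (the area should come out close to pi for a buffer of width 1):

```r
library(sp)
library(rgeos)

# A single point at the origin
p <- SpatialPoints(cbind(0, 0))

# Buffer the point by 1 unit; the result is a SpatialPolygons object
buf <- gBuffer(p, width = 1)

# Area of the buffer polygon (approximates pi)
gArea(buf)
```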

Cutting edge: sf

Things are changing quickly in the R/spatial analysis world and the most fundamental change is via the sf package. This package aims to replace sp, rgdal, and rgeos. There are a lot of reasons why this is a good thing, but that is a bit beyond the scope of this tutorial. Suffice it to say it should make things faster and simpler!

To install sf:

install.packages("sf")
library("sf")
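As a quick, made-up example of why sf is simpler: an sf object is just a data.frame with a geometry list-column, so ordinary data.frame tools keep working.

```r
library(sf)

# Build a tiny sf object from two made-up points
pts <- st_sf(id = 1:2,
             geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1))))

# An sf object is a data.frame underneath
class(pts)
```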

Exercise 1

The first exercise won’t be too thrilling, but we need to make sure everyone has the packages installed.

  1. Install sp and load it into your library.
  2. Repeat with sf, rgdal, raster, and rgeos.

Interacting with an external GIS

Although we won’t be working with external GIS in this tutorial, there are several packages that provide ways to move back and forth from another GIS and R. For your reference, here are some.

  • spgrass6: Provides an interface between R and GRASS 6+. Allows for running R from within GRASS as well as running GRASS from within R.
  • rgrass7: Same as spgrass6, but for the latest version of GRASS, GRASS 7.
  • RPyGeo: A wrapper for accessing ArcGIS from R. Utilizes intermediate python scripts to fire up ArcGIS. Hasn’t been updated in some time.
  • RSAGA: R interface to the command line version of SAGA GIS.

Spatial analysis packages

There are many packages for doing various types of spatial analysis in R. For this tutorial we will look at only a few.

SDMTools

This package is a large suite of tools for species distribution modelling. As part of that suite, the FRAGSTATS metrics are included.

spatstat

This is a huge package and will take care of pretty much all of your point pattern analysis needs.
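As a quick taste (with simulated data), spatstat can generate and summarize a random point pattern in one line each:

```r
library(spatstat)

# Simulate a Poisson point pattern on the unit square
# (lambda is the expected number of points per unit area)
set.seed(42)
p <- rpoispp(lambda = 100)

# Summarize the resulting ppp object
summary(p)
```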

spdep

Another huge package for spatial analysis. This one is dense, but it has all of your autocorrelation/variogram functions!

Exercise 2

  1. Install and load SDMTools, spatstat, and spdep.

Reading in spatial data

sp

Reading in data with sp requires rgdal, which supports many different data formats. For this quick tutorial, let's see an example of reading in a shapefile from the current working directory.

rand <- readOGR(".","rand")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "rand"
## with 100 features
## It has 1 fields
plot(rand)

raster

To read in raster datasets we can use the raster package's raster() function. For most raster formats it is simply a matter of pointing raster() at the file location.

nlcd <- raster("nlcd.tif")
plot(nlcd)

sf

Lastly, reading in data with sf is also pretty straightforward.

sf_rand <- st_read("rand.shp")
## Reading layer `rand' from data source `/data/jhollist/r_landscape_tutorial/rand.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -73.41471 ymin: 41.51605 xmax: -72.57539 ymax: 42.41291
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

One of the benefits of using sf is the speed. In my tests it is about twice as fast. Let's look at a biggish shapefile with 1 million points!


#The old way
system.time(readOGR(".","big"))
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "big"
## with 1000000 features
## It has 1 fields
##    user  system elapsed 
##  11.565   0.225  11.797
#The sf way
system.time(st_read("big.shp"))
## Reading layer `big' from data source `/data/jhollist/r_landscape_tutorial/big.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1000000 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -71.03768 ymin: 41.05976 xmax: -69.09763 ymax: 43.00856
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##    user  system elapsed 
##   5.938   0.101   6.043

Exercise 3

  1. Read in the rest of the example shapefiles with readOGR(). These are unif.shp and clumped.shp. Name them unif and clumped, respectively.

Landscape Metrics

Landscape metrics is a pretty big topic and a HUGE number of metrics can be calculated. There are of course many ways to do that, but, in my experience, the most common has been to use FRAGSTATS, a stand-alone program that has seen many iterations since it was first released as a series of AML and C (I'm dating myself) tools back in 1994. A stand-alone R implementation has not been created, but most of the metrics have been included in the SDMTools package. I am not sure how much active development is going on with SDMTools (it was last updated in August of 2014), but the code should be relatively stable. The primary purpose of SDMTools is species distribution modelling, but it does have two functions for calculating landscape metrics: PatchStat() and ClassStat().

Before we get started with those, make sure SDMTools is installed and loaded:

install.packages("SDMTools")
library("SDMTools")

Class Metrics

Let’s start with the class-level landscape metrics. For this, all we need to do is point ClassStat() at the land cover data (it expects a raster, and supports multiple data structures).

nlcd_class_metrics <- ClassStat(mat = nlcd, cellsize = 30)
dplyr::tbl_df(nlcd_class_metrics)
## # A tibble: 15 × 38
##    class n.patches total.area prop.landscape patch.density total.edge
## *  <dbl>     <int>      <dbl>          <dbl>         <dbl>      <dbl>
## 1     11       764   81593100    0.027556597  2.580272e-07    1235460
## 2     21     11461  191439000    0.064655068  3.870746e-06   12137340
## 3     22     14413   84519000    0.028544767  4.867731e-06    6488400
## 4     23      6765   36191700    0.012223094  2.284757e-06    2674140
## 5     24      1578    8551800    0.002888216  5.329410e-07     542460
## 6     31       272    7546500    0.002548694  9.186309e-08     264900
## 7     41      4610 1650803400    0.557529058  1.556944e-06   27957420
## 8     42      7639  180539100    0.060973823  2.579934e-06    8413320
## 9     43      9895  310977900    0.105027174  3.341858e-06   14941260
## 10    52      2805   30772800    0.010392958  9.473381e-07    1951800
## 11    71       683   10901700    0.003681852  2.306709e-07     533820
## 12    81      2632  147875400    0.049942248  8.889105e-07    4828560
## 13    82       404   19965600    0.006743021  1.364437e-07     626520
## 14    90      5159  184133700    0.062187834  1.742359e-06    6657780
## 15    95       993   15117300    0.005105595  3.353678e-07     776760
## # ... with 32 more variables: edge.density <dbl>,
## #   landscape.shape.index <dbl>, largest.patch.index <dbl>,
## #   mean.patch.area <dbl>, sd.patch.area <dbl>, min.patch.area <dbl>,
## #   max.patch.area <dbl>, perimeter.area.frac.dim <dbl>,
## #   mean.perim.area.ratio <dbl>, sd.perim.area.ratio <dbl>,
## #   min.perim.area.ratio <dbl>, max.perim.area.ratio <dbl>,
## #   mean.shape.index <dbl>, sd.shape.index <dbl>, min.shape.index <dbl>,
## #   max.shape.index <dbl>, mean.frac.dim.index <dbl>,
## #   sd.frac.dim.index <dbl>, min.frac.dim.index <dbl>,
## #   max.frac.dim.index <dbl>, total.core.area <dbl>,
## #   prop.landscape.core <dbl>, mean.patch.core.area <dbl>,
## #   sd.patch.core.area <dbl>, min.patch.core.area <dbl>,
## #   max.patch.core.area <dbl>, prop.like.adjacencies <dbl>,
## #   aggregation.index <dbl>, lanscape.division.index <dbl>,
## #   splitting.index <dbl>, effective.mesh.size <dbl>,
## #   patch.cohesion.index <dbl>

Patch Metrics

Calculating the patch metrics is a bit more involved because it is looking for a binary raster. So if we want to look at the metrics for all forest patches we need to pull those out of our land cover. The raster package works well with logical statements and will return the cells that match as another raster. We can string those together with some “or”s to get all forest patches. Additionally, we want each individual patch labeled. We can do that with the SDMTools function ConnCompLabel().

nlcd_forest <- ConnCompLabel(nlcd==41|nlcd==42|nlcd==43)
plot(nlcd_forest)

Then to calculate the individual patch metrics.

nlcd_forest_patch_metrics <- PatchStat(mat = nlcd_forest, cellsize = 30)
dplyr::tbl_df(nlcd_forest_patch_metrics)
## # A tibble: 1,785 × 12
##    patchID n.cell n.core.cell n.edges.perimeter n.edges.internal      area
##      <int>  <int>       <int>             <int>            <int>     <dbl>
## 1        0 909564      451960            624262          3013994 818607600
## 2        1   4965        3816              1266            18594   4468500
## 3        2  49782       39889             10636           188492  44803800
## 4        3  39939       31968              8612           151144  35945100
## 5        4    530         335               214             1906    477000
## 6        5  30686       27456              3458           119286  27617400
## 7        6     35          11                30              110     31500
## 8        7      2           0                 6                2      1800
## 9        8    887         410               580             2968    798300
## 10       9      1           0                 4                0       900
## # ... with 1,775 more rows, and 6 more variables: core.area <dbl>,
## #   perimeter <dbl>, perim.area.ratio <dbl>, shape.index <dbl>,
## #   frac.dim.index <dbl>, core.area.index <dbl>

And another example for water.

nlcd_water <- ConnCompLabel(nlcd==11)
plot(nlcd_water)

Then the metrics.

nlcd_water_patch_metrics <- PatchStat(mat = nlcd_water, cellsize = 30)
dplyr::tbl_df(nlcd_water_patch_metrics)
## # A tibble: 765 × 12
##    patchID  n.cell n.core.cell n.edges.perimeter n.edges.internal
##      <int>   <int>       <int>             <int>            <int>
## 1        0 3199261     3149797             48188         12748856
## 2        1       5           0                12                8
## 3        2      11           0                18               26
## 4        3       4           0                10                6
## 5        4       1           0                 4                0
## 6        5       3           0                 8                4
## 7        6      10           0                16               24
## 8        7      11           0                20               24
## 9        8      16           2                20               44
## 10       9      52          19                42              166
## # ... with 755 more rows, and 7 more variables: area <dbl>,
## #   core.area <dbl>, perimeter <dbl>, perim.area.ratio <dbl>,
## #   shape.index <dbl>, frac.dim.index <dbl>, core.area.index <dbl>

Roll your own

While there are lots of things included in the SDMTools functions, there are also many other types of metrics that may not be included. In that case you might want to roll your own metrics. At this point, that is largely beyond the scope of this tutorial. Check back in the future!
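That said, a very simple metric can be hand-rolled in a few lines. In this sketch, prop_landscape() is a made-up function name and the tiny landscape is invented for illustration; it computes the proportion of the landscape occupied by each class straight from the cell counts:

```r
library(raster)

# Hand-rolled class-level metric: proportion of landscape per class
prop_landscape <- function(rast) {
  counts <- table(values(rast))
  counts / sum(counts)
}

# Tiny made-up landscape with classes 1 and 2
r <- raster(matrix(c(1, 1, 2, 2, 2, 2, 1, 2, 2), nrow = 3))
prop_landscape(r)
```

The same pattern (pull the cell values out, then summarize with base R) extends to many custom metrics.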

Exercise 4

  1. Calculate patch statistics for the “developed” classes. The NLCD classes are listed on the MRLC Website.

Spatial Statistics

The second big section on landscape ecology data analysis relates to spatial statistics. In this tutorial we will cover point pattern analysis, autocorrelation and variography, and spatial interpolation.

Point Pattern Analysis

Point pattern analysis is supported by many packages, but the one that seems to have the most complete functionality (based on my VERY limited experience working with point patterns in R) is spatstat. If you are looking for a more complete review of point pattern analysis in R, the FWS has a good tutorial.

One issue we will need to deal with is that many of the packages for spatial analysis preceded the formal definitions of spatial data in R (e.g. sp, raster, and sf) and thus have their own data formats. We will need to manipulate the data that we have read into R and create a ppp object. Below are examples for the sp and sf objects.

First, let's convert one of the sp objects. The maptools package currently provides many of these conversions via the as() methods for sp objects.

library(maptools)
#coordinates(rand) = ~x+y
rand_ppp <- as(rand,"ppp")
plot(rand_ppp)

Since sf is so new, there aren't many of these conversion functions available yet. We can roll our own and create the ppp object.

sf_rand_coords <- matrix(unlist(sf_rand$geometry), ncol = 2, byrow = T)
sf_rand_owin <- owin(st_bbox(sf_rand)[c(1,3)],st_bbox(sf_rand)[c(2,4)])
sf_rand_ppp <- ppp(x = sf_rand_coords[,1], y = sf_rand_coords[,2],
                   window = sf_rand_owin, check = T)
plot(sf_rand_ppp)

The hardest part is converting between data structures. Now that we have the ppp objects we can start to analyze the data.

Contour plot

First, some exploratory visualization: contours of a kernel density estimate of the point intensity across the window.

contour(density(rand_ppp))

Quadrat count

We can also count the points falling within each quadrat.

rand_qc <- quadratcount(sf_rand_ppp, 3,3)
plot(rand_qc)
plot(sf_rand_ppp, add = T, cex = 0.5, col = "grey")

Ripley’s K

And lastly, we can look at a Ripley's K plot, as it was presented to us in the Turner and Gardner book.

rand_pp_k <- Kest(rand_ppp)
plot(rand_pp_k)

Exercise 5

  1. Convert the other sp objects, unif and clumped, into ppp objects.
  2. Plot the quadratcount object for both. Use a 3x3 quadrat design.
  3. Plot the Ripley’s K estimates for both objects as well.

Autocorrelation and Variography

Moran’s I - Testing Residuals

#From the spdep help
library(spdep)
## Loading required package: Matrix
data(oldcol)
oldcrime.lm <- lm(CRIME ~ HOVAL + INC, data = COL.OLD)
lm.morantest(oldcrime.lm,nb2listw(COL.nb, style = "W"))
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = CRIME ~ HOVAL + INC, data = COL.OLD)
## weights: nb2listw(COL.nb, style = "W")
## 
## Moran I statistic standard deviate = 2.9539, p-value = 0.001569
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##      0.235638354     -0.033302866      0.008289408

Moran’s I for a raster

# raster::Moran computes global Moran's I, by default using a 3x3 neighborhood
Moran(nlcd)

Variogram
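A minimal sketch of an empirical variogram, using the gstat package (which we have not covered above) and its built-in meuse dataset:

```r
library(sp)
library(gstat)

# Load the meuse soil-sample data that ships with sp/gstat and make it spatial
data(meuse)
coordinates(meuse) <- ~x + y

# Empirical variogram of log zinc concentration
v <- variogram(log(zinc) ~ 1, meuse)
plot(v)
```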

Spatial Interpolation

Inverse Distance Weighting

Splines

Kriging